关于自适应梯度方法等自适应梯度方法等训练动力的知之甚少。在本文中,我们阐明了这些算法在全批处理和足够大的批处理设置中的行为。具体而言,我们从经验上证明,在全批训练中,预处理的Hessian的最大特征值通常在某个数值下平衡 - 梯度下降算法的稳定性阈值。对于带有步长$ \ eta $和$ \ beta_1 = 0.9 $的Adam,此稳定性阈值为$ 38/\ eta $。在Minibatch培训期间发生了类似的影响,尤其是随着批处理大小的增长。然而,即使自适应方法在``稳定性的自适应边缘''(AEOS)上训练,但它们在该制度中的行为与EOS的非自适应方法的行为有很大不同。 EOS处的非自适应算法被阻止进入损失景观的高曲率区域,而AEOS的自适应梯度方法可以继续前进到高外观区域,同时适应预先调节器以补偿。我们的发现可以成为社区对深度学习中适应性梯度方法的未来理解的基础。
translated by 谷歌翻译
语言模型既展示了定量的改进,又展示了新的定性功能,随着规模的增加。尽管它们具有潜在的变革性影响,但这些新能力的特征却很差。为了为未来的研究提供信息,为破坏性的新模型能力做准备,并改善社会有害的效果,至关重要的是,我们必须了解目前和近乎未来的能力和语言模型的局限性。为了应对这一挑战,我们介绍了超越模仿游戏基准(Big Bench)。 Big Bench目前由204个任务组成,由132家机构的442位作者贡献。任务主题是多样的,从语言学,儿童发展,数学,常识性推理,生物学,物理学,社会偏见,软件开发等等。 Big-Bench专注于被认为超出当前语言模型的功能的任务。我们评估了OpenAI的GPT型号,Google内部密集变压器体系结构和大型基础上的开关稀疏变压器的行为,跨越了数百万到数十亿个参数。此外,一个人类专家评估者团队执行了所有任务,以提供强大的基准。研究结果包括:模型性能和校准都随规模改善,但绝对的术语(以及与评估者的性能相比);在模型类中的性能非常相似,尽管带有稀疏性。逐渐和预测的任务通常涉及大量知识或记忆成分,而在临界规模上表现出“突破性”行为的任务通常涉及多个步骤或组成部分或脆性指标;社交偏见通常会随着含糊不清的环境而随着规模而增加,但这可以通过提示来改善。
translated by 谷歌翻译
We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the 2 norm. This "randomized smoothing" technique has been proposed recently in the literature, but existing guarantees are loose. We prove a tight robustness guarantee in 2 norm for smoothing with Gaussian noise. We use randomized smoothing to obtain an ImageNet classifier with e.g. a certified top-1 accuracy of 49% under adversarial perturbations with 2 norm less than 0.5 (=127/255). No certified defense has been shown feasible on ImageNet except for smoothing. On smaller-scale datasets where competing approaches to certified 2 robustness are viable, smoothing delivers higher certified accuracies. Our strong empirical results suggest that randomized smoothing is a promising direction for future research into adversarially robust classification. Code and models are available at http: //github.com/locuslab/smoothing.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
When used in complex engineered systems, such as communication networks, artificial intelligence (AI) models should be not only as accurate as possible, but also well calibrated. A well-calibrated AI model is one that can reliably quantify the uncertainty of its decisions, assigning high confidence levels to decisions that are likely to be correct and low confidence levels to decisions that are likely to be erroneous. This paper investigates the application of conformal prediction as a general framework to obtain AI models that produce decisions with formal calibration guarantees. Conformal prediction transforms probabilistic predictors into set predictors that are guaranteed to contain the correct answer with a probability chosen by the designer. Such formal calibration guarantees hold irrespective of the true, unknown, distribution underlying the generation of the variables of interest, and can be defined in terms of ensemble or time-averaged probabilities. In this paper, conformal prediction is applied for the first time to the design of AI for communication systems in conjunction to both frequentist and Bayesian learning, focusing on demodulation, modulation classification, and channel prediction.
translated by 谷歌翻译
我们引入了一种新的文化学习范式,以测量在推理过程中学习新颖单词的大型语言模型(LLMS)。特别是,我们通过用一个合成但合理的词代替关键概念词来重写Winograd风格的共同参考分辨率问题,该词必须理解该模型以完成任务。解决此任务需要模型来利用提示中给出的新单词的字典定义。这个基准介绍了单词获取,这是折磨llms已知的历时降解的一个重要方面。由于LLM在训练的那一刻及时被冻结,因此通常无法反映语言随着时间的变化方式。我们表明,与原始Winograd任务相比,LLM的准确性在我们的基准测试中从根本上降低,从而确定了当前模型的局限性,并提供了基准来衡量LLMS的未来改善LLMS进行内在学习的能力。
translated by 谷歌翻译
学习在无人驾驶汽车(UAV)捕获的图像中检测物体(例如人类)通常会遭受无人机对物体的位置造成的巨大变化。此外,现有的基于无人机的基准数据集不提供足够的数据集元数据,这对于精确的模型诊断至关重要,并且学习功能不变。在本文中,我们介绍了大天使,这是第一个基于无人机的对象检测数据集,该数据集由具有相似想象条件以及无人机位置以及对象姿势元数据捕获的真实和合成子集组成。一系列实验经过精心设计,使用最先进的对象检测器设计,以证明在模型评估过程中利用元数据的好处。此外,还提供了几种涉及模型微调过程中涉及真实和合成数据的关键见解。最后,我们讨论了有关大天使的优势,局限性和未来方向,以突出其对更广泛的机器学习社区的独特价值。
translated by 谷歌翻译
通过一系列联邦举措和命令,美国政府一直在努力确保美国在AI中的领导。这些广泛的战略文件影响了美国空军美国部(DAF)等组织。DAF-MIT AI加速器是DAF和MIT之间的一项计划,以弥合AI研究人员与DAF任务要求之间的差距。DAF-MIT AI加速器支持的几个项目正在开发公共挑战问题,这些问题解决了许多联邦AI研究的重点。这些挑战是通过公开可用的大型AI-Ready数据集,激励开源解决方案,并为可以激发进一步研究的双重使用技术创建需求信号,来针对优先事项。在本文中,我们描述了正在开发的这些公共挑战以及它们的应用如何促进科学进步。
translated by 谷歌翻译
某些培训干预措施(例如提高学习率和应用批归归式化)的机制提高了深网的概括仍然是一个谜。先前的作品猜测,“扁平”解决方案比“更清晰”的解决方案更好地概括了看不见的数据,激发了几个指标来测量平坦度(尤其是损失Hessian最大的特征值);和算法,例如清晰度最小化(SAM)[1],它们直接优化了平坦度。其他作品质疑$ \ lambda_ {max} $与概括之间的链接。在本文中,我们提出了调用$ \ lambda_ {max} $对概括的影响的发现。我们表明:(1)虽然较大的学习率减少了所有批量尺寸的$ \ lambda_ {max} $,但概括益处有时会在较大的批量尺寸下消失; (2)通过同时缩放批量的大小和学习率,我们可以更改$ \ lambda_ {max} $,而不会影响概括; (3)虽然SAM生产较小的$ \ lambda_ {max} $,用于所有批次尺寸,概括益处(也)消失,较大的批量尺寸; (4)对于辍学,过高的辍学概率可能会降低概括,即使它们促进了较小的$ \ lambda_ {max} $; (5)虽然批处理范围并未始终产生较小的$ \ lambda_ {max} $,但它仍然赋予概括性优势。尽管我们的实验肯定了大型学习率和SAM对Minibatch SGD的概括优势,但GD-SGD差异证明了对$ \ lambda_ {Max} $解释神经网络中概括的能力的限制。
translated by 谷歌翻译
因果表示学习是识别基本因果变量及其从高维观察(例如图像)中的关系的任务。最近的工作表明,可以从观测的时间序列中重建因果变量,假设它们之间没有瞬时因果关系。但是,在实际应用中,我们的测量或帧速率可能比许多因果效应要慢。这有效地产生了“瞬时”效果,并使以前的可识别性结果无效。为了解决这个问题,我们提出了ICITRI,这是一种因果表示学习方法,当具有已知干预目标的完美干预措施时,可以在时间序列中处理瞬时效应。 Icitris从时间观察中识别因果因素,同时使用可区分的因果发现方法来学习其因果图。在三个视频数据集的实验中,Icitris准确地识别了因果因素及其因果图。
translated by 谷歌翻译